fix(llmobs): filter openai Omit/NotGiven sentinels from span metadata #18552

Merged: gh-worker-dd-mergequeue-cf854d[bot] merged 3 commits into main from MLOS-693/filter-openai-omit-metadata on Jun 11, 2026
Conversation

@jessicagamio


Overview

The OpenAI integration captured the raw chat-completion request kwargs as LLM span metadata, including the openai SDK's Omit / NotGiven sentinel objects used as defaults for unset parameters. These were serialized to noisy repr strings such as "<openai.Omit object at 0x7f5e35900e90>" across most of the ~21 metadata keys per span, making the metadata field unqueryable and burying the real parameters.

Motivation

Frameworks like PydanticAI forward every chat-completion parameter explicitly, defaulting any the caller didn't set to openai.omit. ddtrace snapshots the kwargs at the wrapper boundary — upstream of the openai SDK's own sentinel-stripping (_merge_mappings) — so the sentinels reach span metadata even though they never reach the provider. The request to OpenAI/Azure was always correct; only the recorded metadata was polluted.
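To illustrate the symptom described above, here is a minimal sketch of how a sentinel object turns into a repr string once kwargs are snapshotted and serialized. The `Omit` class and `snapshot_metadata` function are hypothetical stand-ins written for this example; they are not the openai SDK type or the ddtrace code path, and the model name is only a placeholder.

```python
import json


class Omit:
    """Hypothetical stand-in for the SDK sentinel; not the real openai type."""


def snapshot_metadata(kwargs):
    # Assumption for illustration: anything that is not JSON-serializable
    # falls back to its repr, which is how a sentinel becomes a string.
    return json.loads(json.dumps(kwargs, default=repr))


# A framework forwards every parameter and marks unset ones with the sentinel.
kwargs = {
    "model": "gpt-4o-mini",
    "top_p": 0.9,
    "temperature": Omit(),
    "stop": Omit(),
}

metadata = snapshot_metadata(kwargs)
for key, value in metadata.items():
    print(key, "=", value)
```

The real values (`model`, `top_p`) come through unchanged, while each unset parameter is recorded as a string of the form `<...Omit object at 0x...>` whose address differs per process, which is why such values are useless for querying.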

Change

Filter Omit / NotGiven sentinel values out before building metadata, in both:

  • get_metadata_from_kwargs (chat / completion)
  • openai_get_metadata_from_response (responses API)

Sentinel types are resolved lazily and independently via a small cached helper. Lazy resolution avoids a circular import while ddtrace is patching openai at import time; independent resolution keeps NotGiven filtering working on openai<2 (which has no Omit). On openai-less installs the helper returns an empty tuple and the filter is a no-op.

Testing

  • Added regression test test_chat_completion_filters_openai_sentinel_metadata — passes a real value (top_p) alongside Omit/NotGiven sentinels and asserts only the real value lands in metadata. Fails without the fix.
  • Verified across the full openai riot matrix (latest, <2.0.0, ~=1.76.2, ==1.66.0) — including 1.x, which exercises the NotGiven-only path.
  • Reproduced the customer's exact stack (PydanticAI → Azure OpenAI): metadata drops from 21 keys (~20 sentinels) to only real values.
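The shape of the regression check can be sketched without ddtrace or openai installed. Everything below is hypothetical and written for this example: `FakeOmit`, `FakeNotGiven`, `filter_metadata`, and `check_only_real_values_survive` are stand-ins, not the actual test or helper from the PR. The two calls at the end mirror the two paths named above, one where both sentinel types exist and one where only `NotGiven` does.

```python
from typing import Any, Dict, Tuple


class FakeOmit:
    """Stand-in for an Omit-style sentinel class."""


class FakeNotGiven:
    """Stand-in for a NotGiven-style sentinel class."""


def filter_metadata(kwargs: Dict[str, Any], sentinel_types: Tuple[type, ...]) -> Dict[str, Any]:
    # isinstance() with an empty tuple is always False, so this is a no-op
    # when no sentinel types are known.
    return {k: v for k, v in kwargs.items() if not isinstance(v, sentinel_types)}


def check_only_real_values_survive(sentinel_types: Tuple[type, ...]) -> None:
    # One real value alongside one placeholder per known sentinel type.
    kwargs: Dict[str, Any] = {"top_p": 0.9}
    for index, sentinel_type in enumerate(sentinel_types):
        kwargs[f"unset_{index}"] = sentinel_type()
    assert filter_metadata(kwargs, sentinel_types) == {"top_p": 0.9}


check_only_real_values_survive((FakeOmit, FakeNotGiven))  # both sentinel types available
check_only_real_values_survive((FakeNotGiven,))  # NotGiven-only path
```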

Risk

Low. The change only removes sentinel placeholder values from metadata; real parameter values are unaffected. Behavior is unchanged for non-openai integrations.

Jira: MLOB-7613

@jessicagamio jessicagamio requested review from a team as code owners June 9, 2026 23:38
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Jun 9, 2026


Pipelines  Tests


⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64, 1]

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741491-d2b8243-musllinux_1_2_x86_64, 1]

DataDog/apm-reliability/dd-trace-py | build linux serverless: [arm64, cp315-cp315, v113741357-d2b8243-manylinux2014_aarch64, 1]

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🔗 Commit SHA: 268c453

@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Jun 9, 2026


Codeowners resolved as

ddtrace/llmobs/_integrations/utils.py                                   @DataDog/ml-observability
releasenotes/notes/llmobs-filter-openai-omit-metadata-15533386303440c7.yaml  @DataDog/apm-python
tests/contrib/openai/test_openai_llmobs.py                              @DataDog/ml-observability

@jessicagamio jessicagamio force-pushed the MLOS-693/filter-openai-omit-metadata branch 2 times, most recently from f04d054 to df1efab Compare June 10, 2026 03:00

@emmettbutler emmettbutler left a comment


release note looks fine

@Yun-Kim Yun-Kim left a comment


I'm fine with what we're fixing, just wonder if there's an easier way to do this

Comment thread ddtrace/llmobs/_integrations/utils.py Outdated
The OpenAI integration captured the raw request kwargs as span metadata,
including the openai SDK's `Omit`/`NotGiven` sentinel objects used as defaults
for unset parameters. These were serialized to noisy repr strings such as
"<openai.Omit object at 0x...>", making metadata unqueryable and burying the
real parameters.

Frameworks like PydanticAI forward every chat-completion parameter explicitly,
defaulting unset ones to `openai.omit`, which is why this surfaced broadly.

Filter both sentinel types out before building metadata, in both
`get_metadata_from_kwargs` (chat/completion) and
`openai_get_metadata_from_response` (responses API). Sentinel types are
resolved lazily and independently to avoid a circular import at patch time and
to keep `NotGiven` filtering working on openai<2 (which has no `Omit`).
@jessicagamio jessicagamio force-pushed the MLOS-693/filter-openai-omit-metadata branch from df1efab to 0bf1ef6 Compare June 10, 2026 21:55
Comment thread releasenotes/notes/llmobs-filter-openai-omit-metadata-15533386303440c7.yaml Outdated
…303440c7.yaml

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit faf70d1 into main Jun 11, 2026
584 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the MLOS-693/filter-openai-omit-metadata branch June 11, 2026 22:01
@github-actions


This change is marked for backport to 4.10 and it does not conflict with that branch.
The command used to test backporting was

git fetch origin 4.10 && git checkout origin/4.10 && git checkout -b backport--to-4.10 && git cherry-pick -x --mainline 1 faf70d11d6ddc328e190255f7d47a9601d7878b5

github-actions Bot added a commit that referenced this pull request Jun 11, 2026
…#18552)


Co-authored-by: jessica.gamio <jessica.gamio@datadoghq.com>
(cherry picked from commit faf70d1)

Co-authored-by: Jessica Gamio <52049720+jessicagamio@users.noreply.github.com>
Yun-Kim pushed a commit that referenced this pull request Jun 12, 2026
… [backport 4.10] (#18592)

Backport #18552 to 4.10

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jessica Gamio <52049720+jessicagamio@users.noreply.github.com>
@github-actions


This change is marked for backport to 4.11 and it does not conflict with that branch.
The command used to test backporting was

git fetch origin 4.11 && git checkout origin/4.11 && git checkout -b backport--to-4.11 && git cherry-pick -x --mainline 1 faf70d11d6ddc328e190255f7d47a9601d7878b5

github-actions Bot added a commit that referenced this pull request Jun 12, 2026
…#18552)


Co-authored-by: jessica.gamio <jessica.gamio@datadoghq.com>
(cherry picked from commit faf70d1)

Co-authored-by: Jessica Gamio <52049720+jessicagamio@users.noreply.github.com>
Yun-Kim pushed a commit that referenced this pull request Jun 12, 2026
… [backport 4.11] (#18604)

Backport #18552 to 4.11

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jessica Gamio <52049720+jessicagamio@users.noreply.github.com>